Policy Iteration Algorithm for Shortest Path Problems

Author

  • STÉPHANE GAUBERT
Abstract

Abstract. The shortest paths tree problem consists of finding a spanning tree, rooted at a given node of a directed weighted graph, such that for each node i the path of the tree from i to the root has minimal weight. We propose an algorithm that is a deterministic version of Howard’s policy iteration scheme. We show that policy iteration is faster than the Bellman (or value iteration) algorithm. In particular, the worst-case execution time of policy iteration is O(nm), where n is the number of nodes and m is the number of arcs. Policy iteration rapidly finds a circuit of negative weight when one exists.
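The scheme the abstract describes can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions (arc-list input, a BFS-built initial policy, and a small tolerance for strict improvement), not Gaubert’s exact algorithm: a policy assigns each node one outgoing arc, evaluation propagates weights along the current tree from the root outward, improvement is a greedy one-step lookahead, and a cycle appearing in the new policy graph witnesses a negative circuit.

```python
from collections import defaultdict, deque

def policy_iteration_shortest_paths(n, arcs, root):
    """Shortest-path tree to `root` by policy iteration (illustrative sketch).

    arcs is a list of (i, j, w): an arc i -> j of weight w.
    Returns (dist, succ) for the nodes that can reach the root;
    raises ValueError when a negative circuit is detected.
    """
    out = defaultdict(list)   # forward adjacency
    rev = defaultdict(list)   # reverse adjacency, to build an initial tree
    for i, j, w in arcs:
        out[i].append((j, w))
        rev[j].append((i, w))

    # Initial policy: any spanning tree towards the root, found by BFS on
    # the reverse graph. succ[i] = (next node, arc weight); None at the root.
    succ, order, q = {root: None}, [root], deque([root])
    while q:
        u = q.popleft()
        for i, w in rev[u]:
            if i not in succ:
                succ[i] = (u, w)
                order.append(i)
                q.append(i)

    while True:
        # Policy evaluation: accumulate weights from the root outward
        # in an order consistent with the current tree.
        dist = {root: 0.0}
        for i in order[1:]:
            j, w = succ[i]
            dist[i] = w + dist[j]
        # Policy improvement: greedy one-step lookahead at every node.
        improved = False
        for i in succ:
            if i == root:
                continue
            best = dist[i]
            for j, w in out[i]:
                if j in dist and w + dist[j] < best - 1e-12:
                    best = w + dist[j]
                    succ[i] = (j, w)
                    improved = True
        if not improved:
            return dist, {i: s[0] for i, s in succ.items() if s is not None}
        # Recompute the evaluation order; a cycle in the new policy graph
        # (nodes unreachable from the root) witnesses a negative circuit.
        children = defaultdict(list)
        for i, s in succ.items():
            if s is not None:
                children[s[0]].append(i)
        order, q = [root], deque([root])
        while q:
            u = q.popleft()
            for c in children[u]:
                order.append(c)
                q.append(c)
        if len(order) != len(succ):
            raise ValueError("negative circuit detected")
```

On a small acyclic instance this converges after a couple of improvement sweeps; each sweep costs O(m), matching the O(nm) worst-case bound quoted above.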


Similar resources

LIDS Report 2871: Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...


Simulation-Based Algorithms for Average Cost Markov Decision Processes

In this paper, we summarize recent developments in simulation-based algorithms for average cost MDP problems, which differ from those for discounted cost or shortest path problems. We introduce both simulation-based policy iteration algorithms and simulation-based value iteration algorithms for the average cost problem, and give the pros and cons of each algorithm.


Q-learning and policy iteration algorithms for stochastic shortest path problems

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1...


Infinite-Space Shortest Path Problems and Semicontractive Dynamic Programming

In this paper we consider deterministic and stochastic shortest path problems with an infinite, possibly uncountable, number of states. The objective is to reach or approach a special destination state through a minimum cost path. We use an optimal control problem formulation, under assumptions that parallel those for finite-node shortest path problems, i.e., there exists a path to the destinat...


Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an ε-optimal policy (except in the special case of discoun...
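For contrast with policy iteration, the deterministic analogue of the value iteration algorithm discussed in these abstracts is a sequence of Bellman sweeps. The sketch below is a hedged illustration (the function name and arc-list input format are assumptions, not taken from any of the cited papers): the naive stopping test "no value changed during a sweep" is exact on finite graphs without negative circuits, and failure to stabilise after n sweeps signals a negative circuit.

```python
def value_iteration_shortest_paths(n, arcs, root):
    """Bellman (value iteration) sweeps for shortest paths to `root`.

    arcs is a list of (i, j, w): an arc i -> j of weight w.
    Values stabilise within n-1 sweeps when no negative circuit exists.
    """
    INF = float("inf")
    v = [INF] * n
    v[root] = 0.0
    for _ in range(n):
        changed = False
        for i, j, w in arcs:
            if v[j] + w < v[i]:      # one-step Bellman update
                v[i] = v[j] + w
                changed = True
        if not changed:
            return v                 # fixed point reached: values are optimal
    raise ValueError("values still changing after n sweeps: negative circuit")
```

Each sweep costs O(m) and up to n sweeps may be needed, which is the O(nm) bound that policy iteration matches in the worst case while typically converging in far fewer iterations.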



Journal:

Volume   Issue 

Pages  -

Publication date: 2007